Abstract

MicroRNAs (miRNAs) are a class of small noncoding RNAs that can regulate many genes by base pairing to sites in mRNAs. 
The functionality of miRNAs overlaps that of short interfering RNAs (siRNAs), and many features of miRNA targeting have been 
revealed experimentally by studying miRNA-mimicking siRNAs. This review outlines the features associated with animal miRNA 
targeting and describes currently available prediction tools. 


1. Introduction 


MicroRNAs (miRNAs) were identified as a large subclass of
noncoding RNAs (ncRNAs) in 2001. Since then, an increasing number
of studies have firmly established the importance of miRNAs in gene
regulation in general, and in animal development and disease in
particular [1-5]. miRNAs regulate protein-coding genes
post-transcriptionally by guiding a protein complex known as the
RNA-induced silencing complex (RISC) to messenger RNAs (mRNAs)
with partial complementarity to the miRNA [6]. Through mechanisms
that are not completely understood, RISC then inhibits protein
translation and causes mRNA degradation [7, 8]. Current estimates
indicate that miRNAs regulate at least 60% of human protein-coding
genes through this post-transcriptional gene silencing (PTGS) [9].

Once incorporated into RISC, miRNAs are functionally equivalent
to short interfering RNAs (siRNAs) [10, 11]. The main difference
between these RNAs is that miRNAs are processed from
imperfect hairpin structures, whereas siRNAs are processed
from long double-stranded RNAs [12, 13]. Moreover, animal
miRNAs typically target imperfect sites, whereas siRNAs target
sites with near-perfect complementarity. siRNAs do target
imperfect sites as well, however, and this miRNA-like targeting
is the major source of siRNA off-target effects [14-16].

The list of known miRNAs is large and growing. Currently,
the official miRNA database, miRBase, lists 721 human miRNAs
(http://www.mirbase.org; Release 14) [17], but estimates
indicate that the human genome contains more than 1000 miRNAs.
As only a few regulatory targets are known, predicting
and validating miRNA targets is one of the major hurdles in
understanding miRNA biology. Here, we review the features
important for miRNA targeting and the bioinformatics tools
available for predicting miRNA targets.
2. miRNA target features

Identifying miRNA targets in animals has been very challenging,
mainly because of the limited complementarity between miRNAs
and their targets, which can yield hundreds of potential targets
per miRNA. Many studies, both experimental and computational,
have therefore sought more efficient approaches to miRNA target
recognition. We have divided the miRNA target features reported
in these studies into six categories: miRNA:mRNA pairing, site
location, conservation, site accessibility, multiple sites and
expression profile.

2.1. miRNA:mRNA pairing: ‘Seed site’ is the most important
feature for target recognition

miRNA targets commonly have at least one region with
Watson-Crick pairing to the 5' part of the miRNA. This 5' part,
located at positions 2-7 from the 5' end of the miRNA, is known
as the ‘seed’, as RISC uses these positions as a nucleation signal
for recognizing target mRNAs [18-20]. The corresponding
sites in the mRNA are referred to as ‘seed sites’. A stringent-seed
site has perfect Watson-Crick pairing to the seed and can be divided
into four ‘seed’ types (8mer, 7mer-m8, 7mer-A1 and 6mer), depending
on the nucleotide opposite miRNA position 1 and on pairing at
position 8 (Fig. 1a). An 8mer has both an adenine at position 1 of
the target site and base pairing at position 8. A 7mer-A1 has an
adenine at position 1, whereas a 7mer-m8 has base pairing at
position 8. A 6mer has neither an adenine at position 1 nor
base pairing at position 8 [21]. An adenine in the target site
opposite position 1 of the miRNA is known to increase the
efficiency of target recognition [22].
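As a minimal sketch of how the four stringent-seed types can be told apart, the Python function below classifies a candidate site. It assumes the site is supplied as the eight mRNA nucleotides (written 5'->3') that face miRNA positions 8 down to 1, and it treats G:U wobbles as non-pairing. The function names are illustrative and are not taken from any tool discussed in this review; human let-7a is used only as example input.

```python
from typing import Optional

# Watson-Crick partners for RNA bases; G:U wobble pairs are deliberately excluded.
WC_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pairs_wc(mirna_base: str, target_base: str) -> bool:
    """Return True if the two bases form a Watson-Crick pair."""
    return WC_PAIR.get(mirna_base) == target_base


def stringent_seed_type(mirna: str, site: str) -> Optional[str]:
    """Classify an 8-nt mRNA site (5'->3') as 8mer, 7mer-m8, 7mer-A1 or 6mer.

    The 3'-most site nucleotide faces miRNA position 1, so reversing the
    site lines facing[i] up with mirna[i] (miRNA position i + 1).
    Returns None when positions 2-7 are not fully Watson-Crick paired.
    """
    if len(site) != 8 or len(mirna) < 8:
        raise ValueError("expected an 8-nt site and a miRNA of at least 8 nt")
    facing = site[::-1]
    # The seed (miRNA positions 2-7) must pair perfectly.
    if not all(pairs_wc(mirna[i], facing[i]) for i in range(1, 7)):
        return None
    a1 = facing[0] == "A"               # adenine opposite miRNA position 1
    m8 = pairs_wc(mirna[7], facing[7])  # Watson-Crick pair at position 8
    if a1 and m8:
        return "8mer"
    if m8:
        return "7mer-m8"
    if a1:
        return "7mer-A1"
    return "6mer"


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a, example input only
print(stringent_seed_type(let7a, "CUACCUCA"))  # 8mer
```

Because 8mer and 7mer-m8 sites share the complement of miRNA positions 2-8, a plain string search for that 7-nt motif finds both; telling them apart requires looking at the nucleotide opposite position 1, as done here.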

Figure 1: Types of miRNA target sites and multiple sites. (a) Stringent-seed
site; 7mer-A1 is shown as an example. Vertical lines indicate Watson-Crick
pairing. (b) Moderate-stringent-seed site; BM is shown as an example.
(c) 3'-supplementary site; at least 3-4 paired nucleotides are required.
(d) Optimal distance between two miRNA target sites.

In addition to stringent-seed matching, moderate-stringent-seed
matching is also functional, because RISC can tolerate small
mismatches or G:U wobble pairing within the seed region (Fig. 1b).
Moderate-stringent-seed matching has five ‘seed’ types, GUM, GUT,
BM, BT and LP, defined by the mismatch type. GUM has one G:U
wobble with the uracil in the miRNA seed, whereas GUT has the
uracil in the mRNA target site. BM has one bulge with the unpaired
nucleotide in the miRNA seed, whereas BT has the unpaired
nucleotide in the target site. LP has a single loop [23].
Furthermore, RISC can recognize offset sites, which pair with
miRNA positions 3-10. Offset sites can have either stringent- or
moderate-stringent-seed matching [24].
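The two wobble types can be detected with a similar sketch. The hypothetical function below assumes the same 8-nt site convention as for stringent seeds (mRNA nucleotides written 5'->3', facing miRNA positions 8 down to 1). It inspects only miRNA positions 2-7 and reports GUM when the single wobble has its uracil on the miRNA and GUT when the uracil is on the target. Bulge (BM, BT) and loop (LP) types shift the alignment and would need a gapped alignment, which this sketch does not attempt.

```python
from typing import Optional

WC_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def wobble_seed_type(mirna: str, site: str) -> Optional[str]:
    """Return "GUM" or "GUT" for a seed with exactly one G:U wobble, else None.

    site is the 8-nt mRNA segment (5'->3') facing miRNA positions 8..1.
    Only positions 2-7 are inspected; bulges and loops are not handled.
    """
    facing = site[::-1]  # facing[i] opposes miRNA position i + 1
    wobble_mirna_bases = []
    for i in range(1, 7):  # miRNA positions 2-7
        m, t = mirna[i], facing[i]
        if WC_PAIR[m] == t:
            continue  # Watson-Crick pair
        if (m, t) in {("U", "G"), ("G", "U")}:
            wobble_mirna_bases.append(m)
        else:
            return None  # a true mismatch, not a wobble-only seed
    if len(wobble_mirna_bases) != 1:
        return None  # perfect seed, or more than one wobble
    # GUM: the uracil is on the miRNA; GUT: the uracil is on the target.
    return "GUM" if wobble_mirna_bases[0] == "U" else "GUT"


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a, example input only
print(wobble_seed_type(let7a, "CUGCCUCA"))  # GUM
```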

Watson-Crick pairing to the 3' part of the miRNA is known to
enhance site recognition efficacy in miRNA targets that have seed
pairing [21]. The preferred number of matches in the 3' part
differs between sites with stringent-seed pairing and sites with
moderate-stringent-seed pairing. Stringent-seed sites require 3-4
matches at positions 13-16, whereas moderate-stringent-seed sites
require 4-5 matches at positions 13-19. Sites with this additional
3' pairing are called 3'-supplementary (Fig. 1c) and
3'-compensatory sites, respectively [7].
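The 3' pairing thresholds can be checked as follows. This sketch rests on a strong simplifying assumption: the caller already knows which mRNA nucleotides face miRNA positions 13-16 (or 13-19) and supplies them as an ungapped segment written 5'->3'. Locating that segment, including the unpaired region between the seed site and the 3' pairing, is not modelled; the rule table (a hypothetical name) simply restates the thresholds given above, using the lower bound of each range.

```python
WC_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

# (first miRNA position, last miRNA position, minimum Watson-Crick matches),
# restating the thresholds given in the text.
THREE_PRIME_RULES = {
    "stringent": (13, 16, 3),  # 3'-supplementary pairing
    "moderate": (13, 19, 4),   # 3'-compensatory pairing
}


def three_prime_matches(mirna: str, segment: str, first: int, last: int) -> int:
    """Count Watson-Crick pairs between miRNA positions first..last (1-based)
    and an ungapped mRNA segment written 5'->3'."""
    if len(segment) != last - first + 1:
        raise ValueError("segment length must match the number of positions")
    facing = segment[::-1]  # facing[k] opposes miRNA position first + k
    return sum(
        WC_PAIR[mirna[pos - 1]] == facing[pos - first]
        for pos in range(first, last + 1)
    )


def has_three_prime_support(mirna: str, segment: str, seed_class: str) -> bool:
    """Apply the rule for a 'stringent' or 'moderate' seed site."""
    first, last, minimum = THREE_PRIME_RULES[seed_class]
    return three_prime_matches(mirna, segment, first, last) >= minimum


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a, example input only
print(has_three_prime_support(let7a, "ACAA", "stringent"))  # True
```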

It is difficult to measure the efficacy of each seed type
precisely, but several microarray and conservation enrichment
studies have revealed hierarchies of relative efficacy. These
hierarchies can be summarized as: stringent seed > stringent seed
in offset > moderate-stringent seed > moderate-stringent seed in
offset; 8mer > 7mer-m8 > 7mer-A1 > 6mer among the stringent-seed
types; and bulge > G:U wobble > loop among the
moderate-stringent-seed types [7, 24]. Moreover, multiple sites
are more efficient than single sites [25].
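One way to use these hierarchies when ranking candidate sites is to encode them as ordinal sort keys. The sketch below assumes that the seed class always takes precedence over the seed type, and the labels "bulge", "GU" and "loop" are hypothetical groupings of BM/BT, GUM/GUT and LP. The ranks express order only; the hierarchies give no numeric efficacies.

```python
# Ordinal ranks (0 = most efficacious) encoding the hierarchies quoted above.
# They express order only; no numeric efficacies are implied.
CLASS_RANK = {
    "stringent": 0,
    "stringent_offset": 1,
    "moderate": 2,
    "moderate_offset": 3,
}
TYPE_RANK = {
    # stringent-seed types
    "8mer": 0, "7mer-m8": 1, "7mer-A1": 2, "6mer": 3,
    # moderate-stringent-seed types, grouped by mismatch kind
    "bulge": 0, "GU": 1, "loop": 2,
}


def efficacy_key(site_class: str, site_type: str) -> tuple:
    """Sort key: first by seed class, then by seed type within the class."""
    return (CLASS_RANK[site_class], TYPE_RANK[site_type])


sites = [
    ("moderate", "GU"),
    ("stringent", "6mer"),
    ("moderate", "bulge"),
    ("stringent", "8mer"),
    ("stringent_offset", "6mer"),
]
ranked = sorted(sites, key=lambda s: efficacy_key(*s))
print(ranked)
```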

The choice of seed types involves a trade-off. Considering only
stringent-seed types increases specificity but might miss many
potential targets, whereas considering both stringent- and
moderate-stringent-seed types increases sensitivity but might also
increase the number of false positives.


2.2. Site location: most target sites reside within the 3' untranslated
region (UTR) of target genes

Several studies have reported that most target sites are found
in the 3' UTR of target genes, even though miRNA-loaded RISC
can in theory bind any segment of an mRNA. Target genes tend to
have longer 3' UTRs, whereas ubiquitously expressed genes, such as
housekeeping genes, have shorter 3' UTRs, potentially to avoid
being regulated by miRNAs [26]. Target sites are not evenly
distributed within the 3' UTR: when the 3' UTR is longer than
2000 nucleotides, sites are located near both ends, whereas in
shorter 3' UTRs they tend to be near the stop codon [23]. Sites
are nevertheless not located too close to the stop codon, but at
least 15-20 nucleotides away from it [21]. In addition, some genes,
especially those with long 3' UTRs, are alternatively spliced in
their 3' UTRs. Such genes might therefore have different potential
target sites in their alternative 3' UTR isoforms [27]. Finally,
alternative polyadenylation sites can shorten 3' UTRs and affect
miRNA regulation [28].
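The stop-codon rule can be applied as a simple post-filter on seed matches. The sketch below assumes that the 3' UTR string begins with the first nucleotide after the stop codon, that a site's position is the 0-based index of its 5' end, and that 15 nt (the lower end of the range quoted above) is a suitable cutoff. It searches for the 7-nt complement of miRNA positions 2-8, so it finds 7mer-m8 and 8mer sites only; the function names are hypothetical.

```python
from typing import List


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(rna))


def seed_site_starts(utr: str, mirna: str) -> List[int]:
    """0-based 5'-end positions of 7mer-m8 (and hence 8mer) matches in a 3' UTR."""
    motif = reverse_complement(mirna[1:8])  # complement of miRNA positions 2-8
    starts = []
    pos = utr.find(motif)
    while pos != -1:
        starts.append(pos)
        pos = utr.find(motif, pos + 1)  # step by one to allow overlapping matches
    return starts


def drop_sites_near_stop(starts: List[int], min_distance: int = 15) -> List[int]:
    """Keep sites whose 5' end lies at least min_distance nt past the stop codon."""
    return [s for s in starts if s >= min_distance]


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a, example input only
utr = "A" * 10 + "CUACCUC" + "A" * 20 + "CUACCUC" + "A" * 5
starts = seed_site_starts(utr, let7a)
print(starts, drop_sites_near_stop(starts))  # [10, 37] [37]
```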

Although functional miRNA sites are preferentially located
in the 3' UTR, seed sites in the coding sequence (CDS) and 5' UTR
can also cause downregulation [29, 30]. Why, then, does RISC
appear to prefer the 3' UTR? The most probable explanation
is that RISC competes with other protein complexes, such
as ribosomes in the CDS and translation initiation complexes in the
5' UTR; see the discussion in Section 2.5, ‘Multiple sites:
cooperativity enhances site efficacy’. The 3' UTR might simply
be more accessible for long-term binding than the other two
mRNA regions [5].

Despite this general trend for 3' UTR targeting, there are
some notable exceptions. One recent study reported that many
miRNAs preferentially target 5' UTR sites with high
complementarity to the miRNA 3' end in a species-specific manner
[31]. These targets also showed signs of weaker interactions
between the miRNA seed sequence and the 3' UTR. The authors
proposed that these sites represent a new miRNA target class,
called ‘miBridge’, in which one miRNA simultaneously interacts
with a seed-pairing site in the 3' UTR and a 3'-pairing site in
the 5' UTR. The molecular mechanisms behind these miBridge
targets, and their biological extent, remain unknown, however.

Most miRNA target prediction studies focus only on the 3' UTR,
so the available data are biased toward the 3' UTR. Moreover,
few studies consider alternative splicing or polyadenylation
because of shortcomings in current annotations. As transcript
usage often depends on cellular context (for example, whether the
cell is proliferating or terminally differentiated), future tools
for miRNA target analysis should probably use available
information about cellular state to increase prediction
performance.

2.3. Conservation: miRNAs and their targets are conserved 
among related species 

miRNA families consist of miRNAs that share the same seed
sequence, and they are well conserved among related species.
In addition, miRNA families have targets that are conserved
among related species [9]. There are also species-specific miRNAs
and targets, and one study showed that about 30% of
experimentally validated target genes might not be well
conserved [32].

siRNA off-target effects occur whether or not the site is
conserved [33]; searching for all potential target sequences
without requiring evolutionary conservation might therefore
improve the detection of siRNA off-targets.

Applying a filter that requires predicted target sites to be
conserved can decrease the false-positive rate, but such a filter
is effective only for conserved miRNAs. It is important to identify
targets both with and without conservation, especially when
species-specific miRNAs or siRNA off-targets are of interest.
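A conservation filter that still reports nonconserved sites might look like the sketch below. It makes an alignment-free simplification: a site counts as conserved in another species if the same 7mer-m8 motif occurs anywhere in that species' orthologous 3' UTR, rather than at the aligned position, which a more faithful implementation would check using UTR alignments. The species names, sequences, function names and the 0.5 cutoff are all hypothetical.

```python
from typing import Dict


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, G<->C)."""
    pairs = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[b] for b in reversed(rna))


def conservation_fraction(mirna: str, utrs: Dict[str, str], reference: str) -> float:
    """Fraction of non-reference species whose 3' UTR contains the 7mer-m8 motif."""
    motif = reverse_complement(mirna[1:8])
    if motif not in utrs[reference]:
        raise ValueError("the reference 3' UTR contains no site for this miRNA")
    others = [species for species in utrs if species != reference]
    if not others:
        return 0.0
    return sum(motif in utrs[species] for species in others) / len(others)


def label_site(fraction: float, cutoff: float = 0.5) -> str:
    """Label rather than discard, so nonconserved sites stay available."""
    return "conserved" if fraction >= cutoff else "nonconserved"


let7a = "UGAGGUAGUAGGUUGUAUAGUU"  # human let-7a, example input only
utrs = {
    "human": "AAACUACCUCAAA",
    "mouse": "GGCUACCUCGG",
    "dog": "GGGGGGGGGGG",
}
fraction = conservation_fraction(let7a, utrs, reference="human")
print(fraction, label_site(fraction))  # 0.5 conserved
```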

2.4. Site accessibility: mRNA secondary structure affects site 
accessibility 

mRNA secondary structure is very important for miRNA
targeting. An effective miRNA:mRNA interaction needs an
open structure at the target site to begin hybridization.
After binding, RISC can disrupt the secondary structure
around the site to extend hybridization [34, 35]. Minimum free
energy is usually used to estimate secondary structure and
RNA hybridization, but the A:U content surrounding the
site can also be used to estimate site accessibility. Effective
target sites often have an A:U-rich context within approximately
30 nucleotides upstream and downstream of the seed-matching
region of the target site [21].
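The A:U-context heuristic is cheap to compute. The sketch below returns the unweighted fraction of A and U nucleotides in up to 30 nt on each side of a site, clipping at the UTR ends. Published context scores may weight positions differently, so this illustrates the idea rather than reproducing any tool's score; the function name is hypothetical, and site coordinates follow Python slicing (start inclusive, end exclusive).

```python
def au_flank_fraction(utr: str, site_start: int, site_end: int, flank: int = 30) -> float:
    """Unweighted A/U fraction in up to `flank` nt on each side of utr[site_start:site_end]."""
    upstream = utr[max(0, site_start - flank):site_start]  # clipped at the 5' end
    downstream = utr[site_end:site_end + flank]            # slicing clips at the 3' end
    context = upstream + downstream
    if not context:
        return 0.0
    return sum(base in "AU" for base in context) / len(context)


utr = "G" * 30 + "CUACCUC" + "A" * 30
print(au_flank_fraction(utr, 30, 37))  # 0.5
```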

Calculating the minimum free energy of accessibility and
hybridization with the mRNA secondary structure requires
analyzing different mRNA folding patterns. This requires
enormous computing power, as the time needed to find the most
stable RNA structure scales with the cube of the length of the
RNA sequence [36]. Hence, finding hybridization sites in long
3' UTRs tends to be time-consuming. Moreover, the current
thermodynamic models used in RNA secondary structure prediction
algorithms are only 90-95% accurate, so the algorithms tend to
get only 50-70% of the base pairs correct [36]. Thus, despite
being theoretically sound, calculating site accessibility has
limited practical value when predicting miRNA target sites;
heuristics that are easy to compute, such as local A:U context,
have similar performance.
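To put the cubic scaling in perspective, a back-of-the-envelope estimate that ignores constant factors and assumes the whole sequence is folded at once compares folding a 2000-nt 3' UTR with folding a 100-nt segment:

```latex
T(L) \propto L^{3}
\quad\Longrightarrow\quad
\frac{T(2000)}{T(100)} = \left(\frac{2000}{100}\right)^{3} = 20^{3} = 8000
```

Doubling the sequence length alone multiplies the running time by 2^3 = 8.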

2.5. Multiple sites: cooperativity enhances site efficacy 

Strong miRNA targets tend to have multiple target sites instead
of a single site [37]. Considering the number of putative miRNA
sites per mRNA can therefore significantly enhance target
prediction.

Although the general effect of multiple sites appears to be
additive, miRNA targeting can also be synergistic. Our previous
study showed that two target sites within an optimal distance of
each other enhance target site efficacy. The optimal distance is
between 17 and 35 nucleotides, but distances between 14 and 46
nucleotides also enhance efficacy (Fig. 1d). This cooperativity
occurs between sites for the same miRNA as well as between sites
for two different miRNAs [25]. Clusters of more than two sites
can also contribute to the enhancement [38].
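The distance ranges above translate directly into a pairwise check over site positions. In this sketch the distance between two sites is assumed to be measured between their 5' ends (start positions); the sites may belong to the same miRNA or to two different miRNAs, since the cooperativity applies to both. The function names and labels are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

OPTIMAL = (17, 35)    # optimal spacing range quoted in the text (nt)
ENHANCING = (14, 46)  # wider range that still enhances efficacy (nt)


def spacing_class(distance: int) -> str:
    """Label a site-to-site distance using the ranges above."""
    if OPTIMAL[0] <= distance <= OPTIMAL[1]:
        return "optimal"
    if ENHANCING[0] <= distance <= ENHANCING[1]:
        return "enhancing"
    return "independent"


def cooperative_pairs(site_starts: List[int]) -> List[Tuple[int, int, str]]:
    """Return (start1, start2, label) for each site pair within the enhancing range.

    Distance is assumed to be measured between the 5' ends of the two sites.
    """
    pairs = []
    for a, b in combinations(sorted(site_starts), 2):
        label = spacing_class(b - a)
        if label != "independent":
            pairs.append((a, b, label))
    return pairs


starts = [10, 35, 100, 140]
print(len(starts), cooperative_pairs(starts))
# 4 [(10, 35, 'optimal'), (100, 140, 'enhancing')]
```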

The exact mechanism underlying the synergism remains unknown.
As translational suppression is a relatively slow process
compared with RISC's catalytic cleavage [39], however, multiple
RISC complexes bound at closely spaced target sites might
cooperatively stabilize each other at the sites or possibly
accelerate the regulatory process. This could explain why miRNAs
prefer targets in 3' UTRs, as ribosomes would displace RISC
from sites in the CDS before RISC could effect translational
suppression. Indeed, a cluster of rare codons that stall the
ribosome can, when placed in front of a nonfunctional miRNA site
in the CDS, turn it into a functional site [40]. Moreover, the
genes that currently have verified miRNA target sites in the CDS
tend to have either one very strong target site [41, 42] or
multiple, closely spaced sites [43, 44].

2.6. Expression profile: miRNA:mRNA pairs are negatively 
correlated in expression profiles 

One miRNA can potentially regulate many genes; therefore,
mRNA expression profiles might vary substantially depending on
miRNA expression levels. Many miRNAs are also expressed
differently in different tissues. Consequently, if negatively
correlated expression levels of a miRNA:mRNA pair are detected
across different tissue profiles, the mRNA of the pair is probably
targeted by the miRNA [45, 46]. Filtering putative targets based
on expression profile correlations is an effective approach to
reduce the false-positive rate. Although the majority of miRNA
targets appear to be regulated at both the mRNA and protein
level, some targets only show an effect at the protein level
[47, 48]. Researchers should therefore be aware that such
filtering will exclude potential targets.
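An expression-based filter of this kind can be sketched with a plain Pearson correlation across tissues. The -0.5 cutoff, the gene names and the profiles below are hypothetical, and other correlation measures could be substituted. As noted above, any target regulated only at the protein level would be discarded by such a filter.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def anticorrelated_targets(
    mirna_profile: Sequence[float],
    mrna_profiles: Dict[str, Sequence[float]],
    max_r: float = -0.5,
) -> List[str]:
    """Keep putative targets whose expression is negatively correlated with the miRNA."""
    return [
        gene
        for gene, profile in mrna_profiles.items()
        if pearson(mirna_profile, profile) <= max_r
    ]


mirna = [1.0, 2.0, 3.0, 4.0]  # e.g. miRNA levels in four tissues
candidates = {"geneA": [4.0, 3.0, 2.0, 1.0], "geneB": [1.0, 2.0, 3.0, 4.0]}
print(anticorrelated_targets(mirna, candidates))  # ['geneA']
```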

3. Target prediction tools 

Many target prediction tools have been developed (Table 1),
but the methods applied, the miRNA and mRNA sequences used,
the output prediction data and the performance evaluation vary
widely between tools. Direct comparison of prediction performance
among tools is not straightforward, because the sets of target
genes predicted by different tools do not overlap well. What is
clear, however, is that conventional tools based on a simple
stringent-seed search are prone to high false-positive rates.
Therefore, most tools are designed to reduce the false-positive
rate while maximizing accuracy. We have compared the currently
available tools based on the target features they use in their
predictions (Table 1) and their availability (Table 2).
Availability is especially important for researchers who use
their own miRNA or mRNA annotations or work in a nonstandard
species. In these cases, only tools that can be downloaded, or
that allow users to input their own miRNAs and mRNAs, can be used.

Most tools rely on one or a combination of seed matching,
site accessibility and evolutionary conservation features,
although some recently developed tools use expression profiles.
No tool has yet successfully incorporated some of the important
features, such as the optimal distance between multiple miRNA
sites or supplementary sites in the CDS and 5' UTR.

TargetScan [9, 21, 22], PicTar [49-52], miRanda [37, 53],
RNAhybrid [56, 57] and PITA [35] have been frequently used
for performance comparisons or as preprocessors for other tools
to obtain initial putative target sites. Of the five, TargetScan
often shows the best performance in comparisons. TargetScan
considers only stringent seeds, however, and therefore ignores
many potential targets.

Table 1: List of miRNA target prediction tools

| Tool | Pair (a) | Site (b) | Consv (c) | Access (d) | MultP (e) | Expr (f) | Refs |
|---|---|---|---|---|---|---|---|
| TargetScan | o | • | • | o | o | | [9, 21, 22] |
| PicTar | • | | o | • | • | | [49-52] |
| miRanda | • | | o | • | o | | [37, 53] |
| MicroCosm Targets | • | | o | • | o | | [17, 54, 55] |
| RNAhybrid | • | | | • | | | [56, 57] |
| PITA | • | | • | • | o | | [35] |
| STarMir | • | | | • | | | [34] |
| Rajewsky & Socci | • | | | • | | | [19] |
| Robins | • | | | • | o | | [58] |
| mirWIP | • | | o | • | o | • | [24] |
| Microinspector | • | | | • | | | [59] |
| MicroTar | • | | | • | | | [60] |
| MirTarget2 | o | • | • | • | | | [61] |
| miTarget | • | | | • | | | [62] |
| TargetMiner | • | | o | • | | • | [63] |
| ElMMo | • | | o | | o | | [23] |
| NBmiRTar | • | | o | • | | | [64] |
| TargetBoost | • | | | | | | [65] |
| RNA22 | • | | o | • | • | | [66] |
| TargetRank | o | | • | o | | | [67] |
| EMBL | • | | o | • | o | | [18, 26, 68] |
| MovingTarget | • | | o | • | o | | [69] |
| DIANA-microT | • | | o | • | | | [70] |
| HOCTAR | • | | o | • | | • | [71] |
| Stanhope | | | | | | • | [72] |
| GenMiR++ | o | | o | | | • | [73] |
| HuMiTar | • | | | | | | [74] |
| MirTif | • | | | | | | [75] |
| Yan et al. | • | | o | • | | | [76] |
| Xie et al. | o | | o | | | | [77] |

(a) miRNA:mRNA pairing. •: stringent seeds; o: moderately stringent seeds; blank: seed sites not considered.

(b) Site location. •: target positions considered; blank: target positions not considered.

(c) Conservation. •: with or without conservation filter; o: with conservation filter; blank: conservation not considered.

(d) Site accessibility. •: site accessibility with minimum free energy considered; o: A:U-rich flanking considered; blank: site accessibility not considered.

(e) Multiple sites. •: multiple sites considered; o: the number of putative sites considered; blank: cooperativity of multiple sites not considered.

(f) Expression profile. •: expression profiles used; blank: expression profiles not used.

4. Summary 

Finding true functional miRNA targets is still challenging,
even though many biological features of miRNA targeting have
been revealed experimentally and computationally. A model with
more features might achieve higher accuracy and better site
recognition, but its implementation might also become more
complex. None of the existing prediction tools has been able to
incorporate all currently known features. We expect that a new
approach that can combine the features from the six categories
we have described will significantly improve computational
miRNA target prediction.

Another important problem that has hardly been addressed is
predicting how different miRNAs jointly regulate their targets.
Different miRNAs can cooperatively regulate individual targets,
but miRNA expression signatures differ between cell types and
cellular conditions. Determining how varying miRNA expression
affects target regulation in, for example, cancerous versus
normal cells will therefore be a major problem in the coming
years.
Table 2: Resource availability for miRNA target prediction tools

| Tool | Predicted species (a) | Web: online tool | Web: own miRNA (b) | Web: own mRNA (c) | SW (d) | URL |
|---|---|---|---|---|---|---|
| TargetScan | 23 vertebrates, f, w | Yes | Yes | Yes | Yes | http://www.targetscan.org |
| PicTar | V, m, f, w | Yes | No | No | No | http://pictar.mdc-berlin.de |
| miRanda | h, m, r | Yes | No | No | Yes | http://www.microrna.org |
| MicroCosm Targets | 44 species | Yes | No | No | No | http://www.ebi.ac.uk/enright-srv/microcosm/htdocs/targets/v5 |
| RNAhybrid | | No | No | No | Yes | http://bibiserv.techfak.uni-bielefeld.de/rnahybrid |
| PITA | h, m, f | Yes | Yes | Yes | Yes | http://genie.weizmann.ac.il/pubs/mir07 |
| STarMir | | Yes | Yes | Yes | No | http://sfold.wadsworth.org/starmir.pl |
| Rajewsky & Socci | f | No | No | No | No | |
| Robins | f, w | No | No | No | No | |
| mirWIP | w | Yes | No | No | Yes | http://146.189.76.171/query.php |
| Microinspector | | Yes | Yes | Yes | No | http://mirna.imbb.forth.gr/microinspector |
| MicroTar | | No | No | No | Yes | http://tiger.dbs.nus.edu.sg/microtar |
| MirTarget2 | h, m, r, d, c | Yes | No | No | No | http://mirdb.org |
| miTarget | | Yes | Yes | Yes | No | http://cbit.snu.ac.kr/~miTarget |
| TargetMiner | h | Yes | No | No | No | http://www.isical.ac.in/~bioinfo_miu |
| ElMMo | h, m, f, z | Yes | No | Yes | No | http://www.mirz.unibas.ch/ElMMo2 |
| NBmiRTar | | Yes | Yes | Yes | No | http://wotan.wistar.upenn.edu/NBmiRTar |
| TargetBoost | w | Yes | Yes | No | No | https://demo1.interagon.com/targetboost |
| RNA22 | | Yes | Yes | Yes | No | http://cbcsrv.watson.ibm.com/rna22.html |
| TargetRank | h, m | Yes | No | No | No | http://hollywood.mit.edu/targetrank |
| EMBL | f | Yes | No | No | No | http://www.russell.embl-heidelberg.de/miRNAs |
| MovingTarget | f | No | No | No | No | |
| DIANA-microT | | Yes | Yes | Yes | No | http://diana.pcbi.upenn.edu/cgi-bin/micro_t.cgi |
| HOCTAR | h | Yes | No | No | No | http://hoctar.tigem.it |
| Stanhope | h | No | No | No | No | |
| GenMiR++ | h | No | No | No | Yes | http://www.psi.toronto.edu/genmir/ |
| HuMiTar | h | No | No | No | No | |
| MirTif | | Yes | Yes | Yes | No | http://bsal.ym.edu.tw/mirtif |
| Yan et al. | h | No | No | No | No | |
| Xie et al. | h, m, r, d | No | No | No | No | |

(a) Species with pre-computed predictions and species available on the web tool are both listed. Letters indicate the species: fly (f), worm (w), human (h), mouse (m), rat (r), chicken (c), zebrafish (z) and dog (d). Cells are left empty when no information is available.

(b) Yes/No indicates whether users can submit their own miRNA sequences through the web interface.

(c) Yes/No indicates whether users can submit their own mRNA sequences through the web interface.

(d) SW: software availability (executable or source code).